Flowing ConvNets for Human Pose Estimation in Videos
The objective of this work is human pose estimation in videos, where multiple
frames are available. We investigate a ConvNet architecture that is able to
benefit from temporal context by combining information across the multiple
frames using optical flow.
To this end we propose a network architecture with the following novelties:
(i) a deeper network than previously investigated for regressing heatmaps; (ii)
spatial fusion layers that learn an implicit spatial model; (iii) optical flow
is used to align heatmap predictions from neighbouring frames; and (iv) a final
parametric pooling layer which learns to combine the aligned heatmaps into a
pooled confidence map.
We show that this architecture outperforms a number of others, including one
that uses optical flow solely at the input layers, one that regresses joint
coordinates directly, and one that predicts heatmaps without spatial fusion.
The new architecture outperforms the state of the art by a large margin on
three video pose estimation datasets, including the very challenging Poses in
the Wild dataset, and outperforms other deep methods that don't use a graphical
model on the single-image FLIC benchmark (and also Chen & Yuille and Tompson et
al. in the high precision region).
Comment: ICCV'1
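As a rough illustration of steps (iii) and (iv) in the abstract above, the sketch below warps a neighbouring frame's heatmap onto the current frame using a per-pixel flow field and then pools the aligned heatmaps with one weight per frame. It is a simplified stand-in under stated assumptions, not the authors' network: the function names, the integer nearest-neighbour warp, and the fixed weights (which the paper's pooling layer would learn) are all hypothetical.

```python
def warp_heatmap(heatmap, flow):
    """Backward-warp a 2D heatmap onto the current frame.

    flow[y][x] = (dx, dy) is assumed to point from a pixel in the current
    frame to the matching pixel in the neighbouring frame.
    """
    h, w = len(heatmap), len(heatmap[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = x + int(round(dx)), y + int(round(dy))
            if 0 <= sx < w and 0 <= sy < h:  # out-of-frame samples stay 0
                out[y][x] = heatmap[sy][sx]
    return out


def pool_heatmaps(aligned, weights):
    """Weighted per-pixel sum over aligned heatmaps (one weight per frame)."""
    h, w = len(aligned[0]), len(aligned[0][0])
    return [
        [sum(wt * hm[y][x] for wt, hm in zip(weights, aligned)) for x in range(w)]
        for y in range(h)
    ]


def argmax_2d(heatmap):
    """Return the (x, y) location of the largest heatmap value."""
    _, y, x = max((v, y, x) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return x, y


# Toy example: the joint sits at (1, 1) in the current frame and at (2, 1)
# in the neighbouring frame, so the flow is one pixel to the right everywhere.
current = [[0.0, 0.0, 0.0], [0.0, 0.8, 0.0], [0.0, 0.0, 0.0]]
neighbour = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
flow = [[(1, 0)] * 3 for _ in range(3)]

warped = warp_heatmap(neighbour, flow)
pooled = pool_heatmaps([current, warped], weights=[0.5, 0.5])
joint_xy = argmax_2d(pooled)
```

After warping, both heatmaps peak at the same pixel, so the pooled confidence map reinforces that location instead of blurring it across two positions.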
RL-LIM: Reinforcement Learning-based Locally Interpretable Modeling
Understanding black-box machine learning models is important for their
widespread adoption. However, developing globally interpretable models that
explain the behavior of the entire model is challenging. An alternative
approach is to explain black-box models by explaining individual
predictions using a locally interpretable model. In this paper, we propose a
novel method for locally interpretable modeling - Reinforcement Learning-based
Locally Interpretable Modeling (RL-LIM). RL-LIM employs reinforcement learning
to select a small number of samples and distill the black-box model prediction
into a low-capacity locally interpretable model. Training is guided with a
reward that is obtained directly by measuring agreement of the predictions from
the locally interpretable model with the black-box model. RL-LIM near-matches
the overall prediction performance of black-box models while yielding
human-like interpretability, and significantly outperforms state-of-the-art
locally interpretable models in terms of overall prediction performance and
fidelity.
Comment: 18 pages, 7 figures, 7 table
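The reward described in the abstract above, agreement between the locally interpretable model and the black-box model, can be sketched as follows. This is a minimal illustration under assumptions, not the RL-LIM implementation: it omits the reinforcement learning policy entirely and only scores two hand-picked sample selections. The 1D weighted linear model, the toy black box, and all names are hypothetical.

```python
def fit_weighted_line(xs, ys, weights):
    """Weighted least-squares fit of y = a * x + b; returns (a, b)."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    var = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    a = cov / var if var > 0 else 0.0
    return a, my - a * mx


def fidelity_reward(black_box, xs, selection, probe):
    """Negative absolute disagreement at `probe` between the black box and a
    local line distilled from the black-box predictions on the selected samples.

    selection[i] is 1.0 if training sample xs[i] is used, else 0.0.
    """
    targets = [black_box(x) for x in xs]  # distil predictions, not true labels
    a, b = fit_weighted_line(xs, targets, selection)
    return -abs((a * probe + b) - black_box(probe))


def black_box(x):
    """Toy stand-in for an opaque model (an assumption for illustration)."""
    return x * x


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
probe = 1.0

# Selecting only samples near the probe versus selecting every sample.
reward_local = fidelity_reward(black_box, xs, [1.0, 1.0, 1.0, 0.0, 0.0], probe)
reward_all = fidelity_reward(black_box, xs, [1.0] * 5, probe)
```

In this toy setting the selection restricted to samples near the probe earns the higher reward, which is the kind of signal a sample-selection policy could be trained on.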